Search CORE

69 research outputs found

Studies in biochemistry--steroid synthesis, P-450 enzymology and renal lithogenesis

Author: Selengut Jeremy D. (Jeremy David)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1994
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemistry, 1994.Includes bibliographical references.by Jeremy D. Selengut.Ph.D

DSpace@MIT

Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

Author: Haft Daniel H
Rusch Douglas B
Selengut Jeremy D
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Springer - Publisher Connector

PubMed Central

A Guild of 45 CRISPR-Associated (Cas) Protein Families and Multiple CRISPR/Cas Subtypes Exist in Prokaryotic Genomes

Author: Daniel H Haft
Emmanuel F Mongodin
Jeremy Selengut
Jonathan A Eisen
Karen E Nelson
Publication venue: Public Library of Science
Publication date: 01/11/2005
Field of study

Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes

Author: Davidsen Tanja
Ganapathy Anurhada
Gwinn-Giglio Michelle
Haft Daniel H.
Nelson William C.
Richter Alexander R.
Selengut Jeremy D.
White Owen
Publication venue: Oxford University Press
Publication date: 06/12/2006
Field of study

TIGRFAMs is a collection of protein family definitions built to aid in high-throughput annotation of specific protein functions. Each family is based on a hidden Markov model (HMM), where both cutoff scores and membership in the seed alignment are chosen so that the HMMs can classify numerous proteins according to their specific molecular functions. Most TIGRFAMs models describe ‘equivalog’ families, where both orthology and lateral gene transfer may be part of the evolutionary history, but where a single molecular function has been conserved. The Genome Properties system contains a queriable set of metabolic reconstructions, genome metrics and extractions of information from the scientific literature. Its genome-by-genome assertions of whether or not specific structures, pathways or systems are present provide high-level conceptual descriptions of genomic content. These assertions enable comparative genomics, provide a meaningful biological context to aid in manual annotation, support assignments of Gene Ontology (GO) biological process terms and help validate HMM-based predictions of protein function. The Genome Properties system is particularly useful as a generator of phylogenetic profiles, through which new protein family functions may be discovered. The TIGRFAMs and Genome Properties systems can be accessed at and

CiteSeerX

Crossref

PubMed Central

eGenomics: Cataloguing Our Complete Genome Collection III

Author: Cochrane Guy
Field Dawn
Garrity George
Glöckner Frank Oliver
Gray Tanya
Kottmann Renzo
Lister Allyson L.
Selengut Jeremy
Sterk Peter
Tateno Yoshio
Tatusova Tatiana
Thomson Nick
Vaughan Robert
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2007
Field of study

This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes

Directory of Open Access Journals

PubMed Central

MPG.PuRe

ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process

Author: CJ Stubben
D Barker
D Barker
D Haft
D Szklarczyk
DA Rodionov
Daniel H Haft
DH Haft
DH Haft
DH Haft
EM Marcotte
F Eckstein
F Enault
GV Glazko
H-Y Ou
J Sun
J Wu
J-P Vert
JAG Ranea
JD Selengut
JD Selengut
JD Selengut
Jeremy D Selengut
L Ferrer
M Csurös
M Huynen
M Pellegrini
MA Huynen
Malay K Basu
MS Gelfand
P Pagel
PM Bowers
PR Kensche
PS Dehal
R Jothi
RL Tatusov
S Briesemeister
S Freilich
SR Eddy
SV Date
SV Date
T Blum
T Gaasterland
T Xu
T Yamada
X Brazzolotto
Y Hong
Y Liu
Y Zhou
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

InterPro: the integrative protein signature database

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/

RERO DOC Digital Library

Comparative Genomics of Emerging Human Ehrlichiosis Agents

Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

InterPro, progress and status in 2005

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro)

The University of Manchester - Institutional Repository

ProdInra

Hal-Diderot

Archive ouverte UNIGE

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Open Research Exeter

Oxford University Research Archive

MDC Repository